Abstract - We present a technique for minimizing the power dissipated in a
VLSI chip by lowering the operating voltage without any significant penalty in 
the chip throughput even though low voltage operation results in slower circuits. 
Since the overall throughput of a VLSI chip depends on the speed of the critical 
path(s) in the chip, it may be possible to sustain the throughput rates attained at 
higher voltages by operating the circuits in the critical path(s) with a high voltage 
while operating the other circuits with a lower voltage to minimize the power 
dissipation. The interface between gates that operate at different voltages is
crucial for low power dissipation, since the interface may draw a high static
current and thereby negate the gains of low voltage operation. We present the
design of a voltage level translator that interfaces the low voltage and high
voltage circuits without any significant static dissipation. We then present the
results of the mixed voltage design using a greedy algorithm on three chips for
various operating voltages.


1 Introduction 

The power consumed in CMOS VLSI primarily results from charging and discharging the
node capacitances in the VLSI circuit. The CV^2 f power dissipated in a VLSI chip depends
on the square of the operating voltage [3]. Thus operating a chip at 3V instead of 5V results
in 64% power savings, since (3/5)^2 = 0.36. Low voltage operation is one way of achieving
low power in systems. Lower voltage operation also results in improved signal integrity and
reliability due to lower crosstalk and ground bounce. Space systems that require low power
with high reliability should find the migration to lower voltages a very attractive
proposition. However, lowering the operating voltage has an adverse effect on the speed of
the circuits. Thus low voltage design reduces the overall chip throughput unless some other
compensating measures are taken.
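The quadratic dependence on voltage can be checked with a few lines of Python. This is a minimal sketch that assumes the capacitance C and the frequency f stay fixed when the supply changes; the function name is illustrative and does not come from the paper.

```python
def dynamic_power_saving(v_low, v_high):
    """Fractional saving in CV^2 f power when the supply drops from
    v_high to v_low, assuming C and f are unchanged."""
    return 1.0 - (v_low / v_high) ** 2


# The 3V versus 5V case quoted in the text.
print(f"{dynamic_power_saving(3.0, 5.0):.0%}")  # prints 64%
```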
In this paper, we propose a mixed voltage design technique which addresses the problem
of lowering the power dissipation without affecting the chip throughput. The circuits in
the critical path(s) are operated with the higher voltage to maintain the chip throughput
while the circuits that are not in the critical paths are operated with the lower voltage to
minimize the power dissipation. The interface between circuits that operate at different
voltages can have large static current dissipation because the transistors in both the p-block
and the n-block may be on, resulting in a conducting path from the power rail to the ground
rail. We present a voltage level translator that can be used in the interface. The translator
is implemented using 6 transistors with appropriate transistor sizing. The translator has an
excellent DC transfer characteristic and fast propagation delays with minimal static
current dissipation (of the order of picoamperes). We then present a greedy algorithm for
doing the mixed voltage design. The algorithm is not optimal but gives very satisfactory
results. Finally, we present the results of the mixed voltage design algorithm on three chips
that were designed at JPL for various flight projects using the Honeywell RICMOS process
[4].

2 Mixed Voltage Design 

2.1 Low Voltage Operation 

The delay (t_d = 4 C_L / (βV)) of a VLSI circuit depends on the inverse of the operating
voltage [3] while the power (P = CV^2 f) consumed by the circuit depends on the square of the
voltage. It follows that if we decrease the operating voltage of a chip,
then we must increase the clock cycle time to allow for the slower operation of the circuits
on the chip. Increasing the chip clock cycle time implies a reduced overall chip throughput.
One way to overcome the reduced chip throughput of low voltage design is to use
architectural techniques like pipelining and parallelism as discussed in [2]. Two problems
with the architectural solutions outlined in [2] are that they involve architectural redesign
and that they require more chip real estate. Essentially, the techniques proposed in [2] trade
off power for area. We propose a technique which does not require any architectural redesign.
The technique relies on the fact that the chip clock cycle time (and throughput) depends on the
speed of the critical path(s). The elements in the critical path(s) may be operated with a
high voltage to ensure that there is no (or minimal) loss of chip throughput. The elements
that are not in the critical paths may be operated with the lower voltage to ensure lower
power dissipation. Let us consider two operating voltages Vddh and Vddl for operating a VLSI
circuit, where Vddh > Vddl. We begin with some definitions to illustrate the tradeoffs in mixed
voltage design.

Definition 1 The switching energy ratio r_SE of a circuit with respect to two voltages Vddh
and Vddl is the ratio of the energy required for switching the circuit with the lower voltage to
the energy required for switching the circuit with the higher voltage.

Definition 2 The switching delay ratio r_SD of a circuit with respect to two voltages Vddh and
Vddl is the ratio of the switching delay of the circuit with the lower voltage to the switching
delay of the circuit with the higher voltage.

Definition 3 The chip energy ratio r_CE of a chip with respect to two voltages Vddh and Vddl
is the ratio of the switching energy of the chip with a mix of Vddh and Vddl operation to the
switching energy of the chip with Vddh operation alone.

Definition 4 The chip delay ratio r_CD of a chip with respect to two voltages Vddh and Vddl
is the ratio of the fastest chip clock cycle time with a mix of Vddh and Vddl operation to the
fastest chip clock cycle time with Vddh operation alone.

Definition 5 The MVD ratio r_MVD of a chip is the ratio of the switching energy ratio to the
chip energy ratio.


Given the expressions for the circuit delay and power dissipation, we have the following
relationships:

\[ r_{SE} = \left( \frac{V_{ddl}}{V_{ddh}} \right)^2, \qquad r_{SD} = \frac{V_{ddh}}{V_{ddl}} \]

The chip delay ratio r_CD is a parameter that should be determined by the system architect.
The parameter depends on how much loss of throughput the chip can tolerate
when we go from high voltage operation to mixed voltage operation. For example, if a
compression chip is designed such that its maximum path delay is 87 ns but the chip only needs
to handle input data rates of 10 MHz (a 100 ns cycle), then the chip can withstand a chip delay
ratio of approximately 100/87 = 1.15. Once an acceptable value for the chip delay ratio has been
determined by the system architect, the technique of mixed voltage design may be used to lower
the chip power dissipation. The reduced power of the chip will result in a chip energy ratio
which is a function of the switching delay ratio and chip delay ratio:

\[ r_{CE} = f(r_{SD}, r_{CD}) \]

It should be noted that the switching energy ratio is dependent on the switching delay ratio:

\[ r_{SE} = \frac{1}{r_{SD}^2} \]
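The ratios defined above are easy to evaluate numerically. The sketch below assumes the first-order models quoted in this section (delay proportional to 1/V, switching energy proportional to V^2) and reuses the 87 ns / 10 MHz compression chip example; the function names are illustrative only.

```python
def switching_ratios(v_ddh, v_ddl):
    """Return (r_SE, r_SD) for one circuit, assuming delay ~ 1/V and
    switching energy ~ V^2."""
    r_sd = v_ddh / v_ddl          # low-voltage delay / high-voltage delay
    r_se = (v_ddl / v_ddh) ** 2   # low-voltage energy / high-voltage energy
    return r_se, r_sd


def tolerable_chip_delay_ratio(max_path_delay_ns, data_rate_mhz):
    """Largest r_CD that still meets the required input data rate."""
    cycle_ns = 1000.0 / data_rate_mhz
    return cycle_ns / max_path_delay_ns


r_se, r_sd = switching_ratios(5.0, 3.0)
print(round(r_se, 3), round(r_sd, 3))                    # 0.36 1.667
print(round(tolerable_chip_delay_ratio(87.0, 10.0), 2))  # 1.15
```

Note that r_SE equals 1/r_SD^2 for any pair of voltages under these models, which is the relation stated above.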


2.2 Voltage Level Translator 

In a chip with mixed voltage design, a gate that operates with the higher voltage Vddh may
receive inputs from gates that operate with the lower voltage Vddl. In such a case, the gate
input swings between Vss and Vddl. When the input is at Vddl, it may produce a conducting
path between Vddh and Vss in the gate, since both the p-transistors and the n-transistors
in the path may be on continuously. This causes a large static current dissipation in the
gate which may negate the gains of low voltage operation. Hence, we need a voltage level
translator to translate the low voltage signals produced by gates that operate with Vddl to
high voltage signals that can be applied as inputs to the gates operated with Vddh. It should
be noted that the reverse voltage level translator, which would translate high voltage signals
to low voltage signals, is not required since high voltage signals driving a gate operated at a
lower voltage can never produce a conducting path between the power and ground rails (other
than the transient conducting path that produces the short-circuit current found in
conventional CMOS). We have designed a voltage level translator using 6 transistors which
allows us to convert low voltage signals (signal swing between Vss and Vddl) to high voltage
signals (signal swing between Vss and Vddh). The desirable characteristics of such a
translator are the following:

• The translator should have good DC characteristics for noise tolerance.

• The translator should have reasonably fast propagation delays.

• The translator should have no (or minimal) static dissipation.

• The translator should not be too expensive in terms of VLSI real estate.

• The translator should not require any special processing for VLSI fabrication.


Figure 1: Voltage level translator

The voltage level translator is shown in Figure 1. The operation of the translator is as follows:

• Assume that the input to the translator, Vin, has been steady at logic level 0, which
implies that Vin = Vss. In this state, transistor m2 is turned off while transistor m4 is
turned on. Since m4 is on, the output Vout is pulled low. Since Vout is fed back to the
gate of m1, transistor m1 is on, which implies that the gate of transistor m3 is pulled
to Vddh. Thus, transistor m3 is off, which allows the output Vout to remain at Vss. It
should be noted that there is no conducting path from Vddh to Vss in this state.

• When the input to the translator switches from Vss to Vddl, transistor m2 is turned
on while transistor m4 is turned off. Since transistor m1 is on at the same time, there
is a conducting path from Vddh to Vss which leads to a flow of short-circuit current.
However, if transistor m2 is larger than transistor m1, then it will overpower transistor
m1 and discharge the gate input of transistor m3, thus forcing m3 to turn on. When m3
turns on, the output Vout is pulled to Vddh, which turns off transistor m1, thus allowing
the translator to settle into a stable state with Vout = Vddh. There is no conducting
path from Vddh to Vss in the stable state, thus eliminating the possibility of any static
current dissipation.

• When the translator input changes back to Vss, transistor m2 is turned off while
transistor m4 is turned on. Since transistor m3 is on at the same time, there is a conducting
path from Vddh to Vss which leads to a flow of short-circuit current. However, if
transistor m4 is larger than transistor m3, then it will overpower transistor m3 and discharge
the gate input of transistor m1, thus forcing transistor m1 to turn on. Vout consequently
goes low. When m1 turns on, the gate input of transistor m3 is pulled to Vddh,
which turns off transistor m3, thus allowing the translator to settle in the stable state
with Vout = Vss. There is no conducting path from Vddh to Vss in the stable state, thus
eliminating the possibility of any static current dissipation.
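The three cases above can be replayed with a small switch-level model. This is only a sketch: the netlist is inferred from the prose (m1 and m3 as cross-coupled p-transistors, m2 and m4 as n-transistors), the gate of m4 is assumed to be driven by the complement of Vin, and the stronger n-transistor is assumed to win any contention, as the sizing rule requires. Node and function names are hypothetical.

```python
def settle(vin, x, out, max_steps=10):
    """Iterate the assumed translator netlist to a stable state.

    vin, x, out are booleans (True = high). x is the node at the gate of m3.
    Returns (x, out, transient_contention, static_path).
    """
    contention = False
    for _ in range(max_steps):
        m1 = not out   # p-transistor, gate = Vout, pulls x to Vddh
        m2 = vin       # n-transistor, gate = Vin, pulls x to Vss
        m3 = not x     # p-transistor, gate = x, pulls Vout to Vddh
        m4 = not vin   # n-transistor, gate = complement of Vin (assumed)
        contention = contention or (m1 and m2) or (m3 and m4)
        # Assumption: the larger n-transistor wins when both devices are on.
        new_x = False if m2 else (True if m1 else x)
        new_out = False if m4 else (True if m3 else out)
        if (new_x, new_out) == (x, out):
            break
        x, out = new_x, new_out
    static_path = (m1 and m2) or (m3 and m4)
    return x, out, contention, static_path


# Start from the steady low state, then apply a rising and a falling input.
state_low = settle(False, True, False)
state_high = settle(True, state_low[0], state_low[1])
state_back = settle(False, state_high[0], state_high[1])
for name, s in (("low", state_low), ("high", state_high), ("back", state_back)):
    print(name, "Vout high:", s[1], "transient:", s[2], "static path:", s[3])
```

In this model both transitions show a transient conducting path while every settled state has none, which matches the description of the translator.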



Figure 2: Pulse response 


Figure 2 shows the response of the voltage level translator to a 5 ns pulse. The simulations
were done using SPICE with models for 2 μm devices from [1]. A critical parameter in the
design of the voltage level translator is the sizing of transistors m2 and m4 relative to m1
and m3. For the simulations, we sized the transistors such that the W/L ratios of m2 and
m4 were 5 times those of m1 and m3. This ratio is sufficient for attaining a reasonably fast
propagation delay as shown in Figure 2. The rise time delay in the simulation was 1.08 ns and
the fall time delay was 1.18 ns. Figure 3 shows the DC transfer characteristic of the voltage
level translator. The figure shows that the translator output has a very sharp transition
when the input voltage is approximately 1.4V (assuming that the translator voltage levels
are 3V and 5V). The sharpness of the response implies very good noise immunity. The
static current dissipation in the translator was 13.9 pA.

Figure 3: DC transfer characteristic

2.3 Optimization Problem 

For the purpose of illustrating the optimization problem in mixed voltage design, we shall
assume a standard cell based design. Each cell may have a number of inputs and outputs
and may occur on a number of different paths in the chip. We denote the delay presented
by cell i to path j by d_ij. Assume that each cell i has a weight w_i which is proportional to
the internal capacitance of the cell. The weight of a cell is used to select between two cells
in the optimization algorithm, since a cell with a higher weight will lead to greater power
savings if the cell is operated with the lower voltage. A path is essentially an ordered set of
cells. Let us denote the set of cells representing path j by S_j.
If we use D_c to denote the chip critical path delay when the entire chip is operated at Vddh,
then the optimization problem can be cast as follows:

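\[ \text{Maximize} \quad \sum_{i} \delta_i w_i \]

\[ \text{subject to} \quad \sum_{i \in S_j} (1 - \delta_i + \delta_i r_{SD}) \, d_{ij} \le r_{CD} D_c \quad \forall j \]

where

\[ \delta_i = \begin{cases} 1 & \text{if cell } i \text{ is operated with the lower voltage} \\ 0 & \text{otherwise} \end{cases} \]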
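To make the formulation concrete, the sketch below evaluates the objective and the timing constraint for a given assignment of the 0/1 variables. The data layout (dictionaries keyed by cell and path names) and the tiny two-path example are assumptions made for illustration and are not taken from the benchmark chips.

```python
def path_delay(cells, j, delta, d, r_sd):
    """Delay of path j: cells at the lower voltage are slowed by r_SD."""
    return sum((1 - delta[i] + delta[i] * r_sd) * d[(i, j)] for i in cells)


def is_feasible(paths, delta, d, r_sd, r_cd, d_c):
    """True if every path delay stays within r_CD * D_c."""
    return all(path_delay(cells, j, delta, d, r_sd) <= r_cd * d_c
               for j, cells in paths.items())


def objective(delta, w):
    """Total weight of the cells operated at the lower voltage."""
    return sum(delta[i] * w[i] for i in w)


# Hypothetical example: two paths, three cells, D_c = 10 at Vddh.
paths = {0: ["a", "b"], 1: ["c"]}
d = {("a", 0): 4.0, ("b", 0): 6.0, ("c", 1): 3.0}
w = {"a": 2, "b": 5, "c": 1}

only_c = {"a": 0, "b": 0, "c": 1}
b_and_c = {"a": 0, "b": 1, "c": 1}
print(is_feasible(paths, only_c, d, 2.0, 1.0, 10.0), objective(only_c, w))    # True 1
print(is_feasible(paths, b_and_c, d, 2.0, 1.0, 10.0), objective(b_and_c, w))  # False 6
```

With r_CD = 1, only the cell on the short path can be moved to the lower voltage in this example, because the critical path has no slack.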

3 Experimental Results 

3.1 Chip Benchmarks 

We considered three chip designs that were done at JPL for various flight projects as
benchmarks for our study of power optimization using mixed voltage design. The chips were
designed using the Honeywell RICMOS process [4] by three different VLSI designers and
consequently had very different path length distributions. The chips were approximately
the same size, and various statistics on the chips are shown in Table 1. The table shows the
total number of paths, cells and registers in each of the chips.

             chip0    chip1    chip2
  n_paths     2699     2846     4123
  n_cells     1936     2121     2038
  n_regs       491      552      519

Table 1: Chip statistics

The cells used in the chip designs ranged in complexity from a simple inverter to a 4-to-1
multiplexer. The designs had a total of 39 different types of cells. The weight w_i of a cell
was assumed to be the number of gate equivalents used by the cell [4]. Figures 4-6 show the
distribution of path lengths in the three chip designs. The length of a path is the total
number of cells in the path. The cumulative frequency (on the y-axis) is the fraction of
paths whose length exceeds the path length shown on the x-axis. The figures and Table 1 show
that one of the chips (chip2) was significantly different from the other two chips. The
largest path length in chip2 was 34 while the largest path length for the other designs was
18. Also, chip2 had a very large number of paths compared to the other chips in spite of the
fact that all the chips had about the same number of cells and registers.
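The cumulative frequency plotted in Figures 4-6 can be computed from a list of path lengths as sketched below. The input list is a made-up example, not data from the benchmark chips, and the function name is illustrative.

```python
def exceedance_fraction(path_lengths):
    """Map each length x to the fraction of paths strictly longer than x."""
    n = len(path_lengths)
    longest = max(path_lengths)
    return {x: sum(1 for p in path_lengths if p > x) / n
            for x in range(longest + 1)}


# Hypothetical path lengths (number of cells per path).
curve = exceedance_fraction([2, 3, 3, 5])
print(curve[0], curve[2], curve[3], curve[5])  # 1.0 0.75 0.25 0.0
```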


Figure 4: Path length distributions in chip0

Figure 5: Path length distributions in chip1


3.2 Greedy Optimization Strategy 

We adopt a greedy optimization algorithm for mixed voltage design. The input to the
algorithm is an adjacency list representation of the chip design. Each list in the adjacency
list represents a path. We sort the cells in decreasing order of weight and then traverse the
sorted list of cells to see which cells can have their voltage lowered to Vddl. The voltage of
a cell can be lowered to Vddl if the delays of all the paths that contain the cell are still
bounded by r_CD D_c when we lower the voltage of the cell to Vddl. A greedy strategy on the
cell weights is not optimal. To see this, consider three cells i, j, and k on path S_l. If the
weights of the cells are such that w_i > w_j > w_k, then during the execution of the greedy
algorithm, cell i will have its voltage lowered to Vddl first. However, doing so may preclude
cells j and k from having their voltages lowered, because lowering them as well may cause the
delay of S_l to exceed r_CD D_c. On the other hand, if we had lowered the voltages of j and k
first, we might have stayed within the bounds imposed by the timing constraint and yet
attained a lower power dissipation. It is easy to see that this will be the case if the
following conditions hold:

\[ d_{jl} + d_{kl} \le d_{il} \]

\[ w_j + w_k > w_i \]

One may consider other types of greedy strategies (involving greed on the ratio of the cell
weight to the cell delay) or other approaches using dynamic programming. The greedy
algorithm (which is greedy on the cell weights) gives us very good results for power savings,
and we did not investigate the other greedy approaches or dynamic programming approaches
for this paper. We plan to look into the other approaches in future research.
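A minimal sketch of the weight-ordered greedy pass is given below, together with an exhaustive search that exposes the non-optimality on a three-cell path. The data layout, the function names, and the numbers in the example are assumptions chosen to satisfy the two conditions above; they do not come from the benchmark chips, and the exhaustive search is practical only for very small inputs.

```python
from itertools import product


def feasible(lowered, paths, d, r_sd, bound):
    """True if every path meets the delay bound with `lowered` cells at Vddl."""
    for j, cells in paths.items():
        delay = sum(d[(i, j)] * (r_sd if i in lowered else 1.0) for i in cells)
        if delay > bound:
            return False
    return True


def greedy(paths, d, w, r_sd, bound):
    """Visit cells by decreasing weight; lower each one that keeps timing."""
    lowered = set()
    for i in sorted(w, key=w.get, reverse=True):
        if feasible(lowered | {i}, paths, d, r_sd, bound):
            lowered.add(i)
    return lowered


def brute_force(paths, d, w, r_sd, bound):
    """Exhaustive search for the feasible set with the largest total weight."""
    cells, best = list(w), set()
    for bits in product([0, 1], repeat=len(cells)):
        chosen = {c for c, b in zip(cells, bits) if b}
        if (feasible(chosen, paths, d, r_sd, bound)
                and sum(w[c] for c in chosen) > sum(w[c] for c in best)):
            best = chosen
    return best


# One path with cells i, j, k: d_j + d_k <= d_i and w_j + w_k > w_i.
paths = {0: ["i", "j", "k"]}
d = {("i", 0): 4.0, ("j", 0): 2.0, ("k", 0): 2.0}
w = {"i": 5, "j": 3, "k": 3}
bound = 1.5 * 8.0  # r_CD * D_c with D_c = 8 at Vddh

print(sorted(greedy(paths, d, w, 2.0, bound)))       # ['i']
print(sorted(brute_force(paths, d, w, 2.0, bound)))  # ['j', 'k']
```

The greedy pass lowers only cell i (weight 5), while lowering j and k instead meets the same delay bound with a total weight of 6.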
Figure 6: Path length distributions in chip2


3.3 Power Savings 

Figures 7-9 show the results of the greedy strategy on the three chip benchmarks. The
figures show the MVD ratio (r_MVD) as a function of the chip delay ratio (r_CD) for various
values of the switching delay ratio r_SD. The MVD ratio should be a monotonically increasing
function of the chip delay ratio. The non-monotonicity of the MVD ratio in the
figures is an artifact of the non-optimality of the greedy algorithm. The power savings is
given by (1 - 1/(r_SD^2 r_MVD)) × 100%. It can be seen that all the chip designs have a power
savings of 65-70% without any significant penalty in the chip throughput (r_CD = 1) and
that all the designs have a power savings of about 83% if the chips are allowed to be slowed
down by a factor of 2. The MVD ratio is almost a linear function of the chip delay ratio
which implies that the power dissipated is inversely proportional to the chip delay ratio. A
higher value of r_SD (as a consequence of lower Vddl) results in lower power dissipation up
to a certain point, after which the power dissipated is constant. For example, for chip0, the
power saved is approximately 65% for switching delay ratios of both r_SD = 2.0 and
r_SD = 2.5, which implies that any further increase of r_SD by reducing Vddl will not result
in any more power savings. In fact, since the power savings is identical for both values of
r_SD, the designer may prefer operating the chip with r_SD = 2.0 since the MVD ratio r_MVD is
higher for r_SD = 2.0 than for r_SD = 2.5. The higher MVD ratio implies fewer translators
in the design, though we have not quantified this in our study yet. Future work will address
this relationship along with analytical models of the MVD ratio and the power savings that
can result.
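The relation between the MVD ratio and the power savings follows from Definition 5 together with r_SE = 1/r_SD^2, and can be sketched as below. The r_MVD values printed here are back-calculated from the approximate 65% savings quoted in the text; they are not read from Figures 7-9, and the function names are illustrative.

```python
def power_savings(r_sd, r_mvd):
    """Fractional savings 1 - r_CE, using r_CE = r_SE / r_MVD and r_SE = 1/r_SD^2."""
    return 1.0 - 1.0 / (r_sd ** 2 * r_mvd)


def mvd_ratio_for_savings(r_sd, savings):
    """MVD ratio that corresponds to a given fractional power savings."""
    r_se = 1.0 / r_sd ** 2
    r_ce = 1.0 - savings
    return r_se / r_ce


for r_sd in (2.0, 2.5):
    r_mvd = mvd_ratio_for_savings(r_sd, 0.65)
    print(r_sd, round(r_mvd, 3), round(power_savings(r_sd, r_mvd), 2))
# 2.0 0.714 0.65
# 2.5 0.457 0.65
```

For the same savings, the smaller switching delay ratio gives the larger MVD ratio, which is the observation made above for chip0.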
Figure 7: Mixed voltage design results for chip0


4 Conclusions 

We have presented a technique for minimizing the power dissipation in CMOS VLSI by
operating the VLSI chip with two voltages. The higher voltage is used to ensure that the
chip throughput is not adversely affected by the low voltage operation, while the lower
voltage is used for the majority of the cells in the chip to ensure low power dissipation.
The key to designing chips with two operating voltages is an innovative voltage level
translator that we designed using 6 transistors, which allows us to translate low voltage
signals to high voltage signals without any significant static power dissipation. Results of
using the mixed voltage design technique with a greedy optimization strategy were presented
for three VLSI chips with considerably different designs, showing that it is possible to save
about 65-70% of the power dissipated in the chips without paying any penalty in the chip
throughput.